Annotating gene functions with integrative spectral clustering on microarray expressions and sequences.

نویسندگان

  • Limin Li
  • Motoki Shiga
  • Wai-Ki Ching
  • Hiroshi Mamitsuka
چکیده

Annotating genes is a fundamental issue in the post-genomic era. A typical procedure for this issue is first clustering genes by their features and then assigning functions of unknown genes by using known genes in the same cluster. A lot of genomic information are available for this issue, but two major types of data which can be measured for any gene are microarray expressions and sequences, both of which however have their own flaws. Thus a natural and promising approach for gene annotation is to integrate these two data sources, especially in terms of their costs to be optimized in clustering. We develop an efficient gene annotation method with three steps containing spectral clustering over the integrated cost, based on the idea of network modularity. We rigorously examined the performance of our proposed method from three different viewpoints. All experimental results indicate the performance advantage of our method over possible clustering/classification-based approaches of gene function annotation, using expressions and/or sequences.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Annotating Gene Functions by Spectral Clustering for Combining Gene Expressions and Sequences

Annotating gene functions is a fundamental issue in the post-genomic era. A typical procedure for this issue is first clustering genes and then assigning functions of unknown genes by using known genes in the same cluster. A lot of genomic information are available for this issue, but two major types of data which can be measured for any genes are microarray expressions and sequences, both of w...

متن کامل

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

متن کامل

Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach

The DNA microarray is an important technique that allows researchers to analyze many gene expression data in parallel. Although the data can be more significant if they come out of separate experiments, one of the most challenging phases in the microarray context is the integration of separate expression level datasets that have gathered through different techniques. In this paper, we prese...

متن کامل

Analysis of Gene Expression Data Using Biclustering Algorithms

One of the main research areas of bioinformatics is functional genomics; which focuses on the interactions and functions of each gene and its products (mRNA, protein) through the whole genome (the entire genetics sequences encoded in the DNA and responsible for the hereditary information). In order to identify the functions of certain gene, we should able to capture the gene expressions which d...

متن کامل

به کارگیری روش‌های خوشه‌بندی در ریزآرایه DNA

Background: Microarray DNA technology has paved the way for investigators to expressed thousands of genes in a short time. Analysis of this big amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering technique in microarray DNA analysis. Materials and methods: We analyzed data of Van’t Veer et al study dealing with BRCA1...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Genome informatics. International Conference on Genome Informatics

دوره 22  شماره 

صفحات  -

تاریخ انتشار 2010